Group By Operations in Pandas

pandas
dataframe
group-by
aggregation
Master Pandas groupby operations: splitting data by categories, applying functions, and combining results. Learn aggregation, transformation, and filtering techniques for data analysis.
Author

Mohammed Adil Siraju

Published

September 21, 2025

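GroupBy operations are one of the most powerful features in Pandas for data analysis. They allow you to split data into groups by category, apply a function to each group, and combine the results into a single output.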

This notebook covers essential groupby techniques including aggregation functions, multiple aggregations, and advanced operations.
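To make the split, apply, and combine steps concrete, here is a minimal sketch that performs them by hand and then compares the result with the built-in call. It uses the same sample values that Section 1 defines; the variable names (pieces, totals, manual, builtin) are illustrative only.

```python
import pandas as pd

# Same values as the sample dataset used throughout this notebook
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Split: one sub-frame per category
pieces = {key: group for key, group in df.groupby('Category')}

# Apply: reduce each piece to a single number
totals = {key: group['Value'].sum() for key, group in pieces.items()}

# Combine: collect the per-group results into one Series
manual = pd.Series(totals, name='Value')

# The built-in call performs all three steps at once
builtin = df.groupby('Category')['Value'].sum()

print(manual)
print(builtin)
```

In practice you would rarely write the manual version; it is only here to show what a groupby call does for you.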

1. Setting Up Sample Data

Let’s create a sample dataset to demonstrate groupby operations. We’ll work with categorical data and numerical values.

import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30]
}

df = pd.DataFrame(data)
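Before aggregating, it can help to look at how pandas splits the rows. The sketch below rebuilds the same frame so it runs on its own, then inspects the groups; the names grouped, sizes, and group_a are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

grouped = df.groupby('Category')

# Number of rows in each group
sizes = grouped.size()
print(sizes)

# Pull out only the rows that belong to one group
group_a = grouped.get_group('A')
print(group_a)

# Iterate over (key, sub-frame) pairs
for key, group in grouped:
    print(key, group['Value'].tolist())
```

Note that creating the groupby object does not compute anything yet; the work happens when you call an aggregation on it.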

2. Basic Aggregation Functions

Groupby operations allow you to calculate summary statistics for each group. Here are the most common aggregation functions:

Sum Aggregation

Calculate the total sum of values for each category:

df.groupby('Category').sum()
          Value
Category
A            60
B            40
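If you only need one column, or prefer the group key as a regular column rather than the index, the following sketch shows two common variations. It assumes the same sample data; totals and totals_flat are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Selecting the column first returns a Series indexed by Category
totals = df.groupby('Category')['Value'].sum()
print(totals)

# as_index=False keeps Category as an ordinary column in a DataFrame
totals_flat = df.groupby('Category', as_index=False)['Value'].sum()
print(totals_flat)
```

Selecting the column explicitly also avoids surprises when the frame later gains non-numeric columns.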

Mean Aggregation

Calculate the average value for each category:

df.groupby('Category').mean()
          Value
Category
A          20.0
B          20.0

Median Aggregation

Calculate the median (middle) value for each category:

df.groupby('Category').median()
          Value
Category
A          20.0
B          20.0
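In the sample data the mean and median happen to be identical. To see where they diverge, here is a small sketch with a made-up dataset (skewed is hypothetical and not part of the notebook's data) in which one group contains an outlier.

```python
import pandas as pd

# Hypothetical data: group A contains one very large value
skewed = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Value': [10, 20, 300, 15, 20, 25],
})

means = skewed.groupby('Category')['Value'].mean()
medians = skewed.groupby('Category')['Value'].median()

# Put both statistics side by side for comparison
comparison = pd.DataFrame({'mean': means, 'median': medians})
print(comparison)
```

The outlier pulls the mean of group A far above its median, while group B is unaffected. This is why the median is often the safer summary for skewed data.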

Maximum Values

Find the highest value in each category:

df.groupby('Category').max()
          Value
Category
A            30
B            25

Minimum Values

Find the lowest value in each category:

df.groupby('Category').min()
          Value
Category
A            10
B            15
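Maximum and minimum are often used together. As a rough sketch on the same sample data, the code below computes the range per group and looks up the full row that holds each group's maximum; value_range and top_rows are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

grouped = df.groupby('Category')['Value']

# Range per group: highest value minus lowest value
value_range = grouped.max() - grouped.min()
print(value_range)

# idxmax() returns the index label of each group's maximum,
# which can be passed to .loc to retrieve the full rows
top_rows = df.loc[grouped.idxmax()]
print(top_rows)
```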

Standard Deviation

Measure the spread of values within each category. By default, pandas computes the sample standard deviation (ddof=1):

df.groupby('Category').std()
              Value
Category
A         10.000000
B          7.071068

Variance

Calculate the variance (squared standard deviation) for each category:

df.groupby('Category').var()
          Value
Category
A         100.0
B          50.0
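The results above are sample statistics, since pandas divides by n - 1 by default. The sketch below, using the same sample data, checks that the variance equals the squared standard deviation and shows how ddof=0 gives the population variance instead; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

grouped = df.groupby('Category')['Value']

# Default: sample variance (divides by n - 1)
sample_var = grouped.var()

# Population variance (divides by n)
population_var = grouped.var(ddof=0)

# Variance is the square of the standard deviation
std_squared = grouped.std() ** 2

print(pd.DataFrame({
    'sample_var': sample_var,
    'population_var': population_var,
    'std_squared': std_squared,
}))
```

For small groups like these the two definitions differ noticeably, so it is worth knowing which one you are reporting.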

3. Multiple Aggregations

You can apply multiple aggregation functions at once using the agg() method. This provides a comprehensive view of your grouped data.

Applying Multiple Functions

Calculate sum, mean, and maximum for each category in one operation:

df.groupby('Category').agg(['sum', 'mean', 'max'])
         Value
           sum  mean  max
Category
A           60  20.0   30
B           40  20.0   25
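Passing a list to agg() produces two-level column labels, as in the output above. If you prefer flat, descriptive column names, named aggregation is one option. This is a minimal sketch on the same sample data; the output names total, average, and largest are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Named aggregation: new_column=(source_column, function)
summary = df.groupby('Category').agg(
    total=('Value', 'sum'),
    average=('Value', 'mean'),
    largest=('Value', 'max'),
)
print(summary)
```

Each keyword becomes a column in the result, which makes the summary easier to reference in later steps.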

Summary

GroupBy operations are essential for data analysis in Pandas. In this notebook, you learned:

🔢 Basic Aggregation Functions

  • sum(): Total values per group
  • mean(): Average values per group
  • median(): Middle value per group
  • max() / min(): Highest/lowest values per group
  • std() / var(): Measure spread within groups

📊 Advanced Operations

  • agg(): Apply multiple functions simultaneously
  • Combine statistics for comprehensive group analysis

💡 Key Concepts

  1. Split-Apply-Combine: The three-step process of groupby operations
  2. Aggregation: Reducing groups to single values (sum, mean, etc.)
  3. Multiple Functions: Use agg() for comprehensive summaries

🚀 Best Practices

  • Choose appropriate aggregation functions for your data type
  • Use multiple aggregations to get complete group insights
  • Consider data distribution when selecting measures (mean vs median)

📈 Next Steps

  • Explore groupby with multiple columns
  • Learn filtering and transformation operations
  • Practice with real datasets for business insights
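As a starting point for these next steps, here is a brief sketch of grouping by multiple columns, transformation, and filtering. The Region column and the GroupMean name are hypothetical additions for illustration; they are not part of the notebook's sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Region': ['East', 'East', 'West', 'West', 'East'],  # hypothetical column
    'Value': [10, 15, 20, 25, 30],
})

# Group by multiple columns: one result per (Category, Region) pair
by_two = df.groupby(['Category', 'Region'])['Value'].sum()
print(by_two)

# Transformation: return a value for every original row,
# here the mean of the row's own group
df['GroupMean'] = df.groupby('Category')['Value'].transform('mean')
print(df)

# Filtering: keep only the rows of groups that satisfy a condition,
# here groups with at least three rows
large_groups = df.groupby('Category').filter(lambda g: len(g) >= 3)
print(large_groups)
```

Unlike aggregation, transform() preserves the original number of rows, and filter() drops whole groups rather than individual rows.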

Mastering groupby operations will significantly enhance your data analysis capabilities! 🎯